Search for: All records

Creators/Authors contains: "Yang, Yifan"


  1. Abstract

    Deep learning has become a popular tool for computer-aided diagnosis using medical images, sometimes matching or exceeding the performance of clinicians. However, these models can also reflect and amplify human bias, potentially resulting in inaccurate or missed diagnoses. Despite this concern, the problem of improving model fairness in medical image classification by deep learning has yet to be fully studied. To address this issue, we propose an algorithm that leverages the marginal pairwise equal opportunity to reduce bias in medical image classification. Our evaluations across four tasks using four independent large-scale cohorts demonstrate that our proposed algorithm not only improves fairness in individual and intersectional subgroups but also maintains overall performance. Specifically, relative to the baseline model, our proposed model reduced the pairwise fairness difference by over 35%, while the relative change in AUC value was typically within 1%. By reducing the bias generated by deep learning models, our proposed approach can potentially alleviate concerns about the fairness and reliability of image-based computer-aided diagnosis.

     
    Free, publicly-accessible full text available December 1, 2024
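
The equal-opportunity idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that equal opportunity is measured as the gap in true positive rate (TPR) between subgroups, which is the common textbook definition; the paper's exact "marginal pairwise" formulation may differ. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN); returns None if the group has no positive cases."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return None
    return sum(preds_on_positives) / len(preds_on_positives)


def pairwise_equal_opportunity_gaps(y_true, y_pred, groups):
    """Absolute TPR gap for every pair of subgroups (assumed metric, see lead-in)."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    tpr = {g: true_positive_rate(ts, ps) for g, (ts, ps) in by_group.items()}
    tpr = {g: v for g, v in tpr.items() if v is not None}
    return {(a, b): abs(tpr[a] - tpr[b]) for a, b in combinations(sorted(tpr), 2)}


# Toy data (made up): binary labels, binary predictions, one subgroup tag per case.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

gaps = pairwise_equal_opportunity_gaps(y_true, y_pred, groups)
print(gaps)
```

Intersectional subgroups can be handled by the same function if each entry of `groups` is a tuple of attributes, for example `("female", "65+")`.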
  2. Abstract

    Single-crystalline BaMnSb2 is considered a 3D Weyl semimetal whose 2D electronic structure contains Dirac cones from the Sb sheet. We report an experimental investigation of low-temperature cleaved BaMnSb2 surfaces using scanning tunneling microscopy/spectroscopy and low energy electron diffraction. By natural cleavage, we find two terminations: one is Ba (above the orthorhombically distorted Sb sheet) and the other is Sb2 (at the surface of the Sb/Mn/Sb sandwich layer). Both terminations show 2 × 1 surface reconstructions, but with drastically different morphologies and electronic properties. The reconstructed structures, defect types and nature of the electronic structures of the two terminations are extensively studied. Quasiparticle interference (QPI) analysis is conducted over the energy range between −2 V and 2 V. Although no interesting states are observed near the Fermi level, the surface-projected electronic band structures strongly depend on the surface termination above 1.6 V. Defects can greatly modify the local density of states, creating electronic phase separation on the surface on the scale of tens of nm. Our observations of the atomic structures of the terminations and the corresponding electronic structures provide critical information towards an understanding of the topological properties of BaMnSb2.

     
  3. The Togashi Kaneko model (TK model) is a simple stochastic reaction network that displays discreteness-induced transitions between meta-stable patterns. Here we study a constrained Langevin approximation (CLA) of this model. This CLA, derived under the classical scaling, is an obliquely reflected diffusion process on the positive orthant and hence respects the constraint that chemical concentrations are never negative. We show that the CLA is a Feller process, is positive Harris recurrent and converges exponentially fast to the unique stationary distribution. We also characterize the stationary distribution and show that it has finite moments. In addition, we simulate both the TK model and its CLA in various dimensions. For example, we describe how the TK model switches between meta-stable patterns in dimension six. Our simulations suggest that, when the volume of the vessel in which all of the reactions take place is large, the CLA is a good approximation of the TK model in terms of both the stationary distribution and the transition times between patterns.
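
As a rough illustration of the kind of stochastic simulation described in the abstract above, here is a minimal Gillespie-style sketch of a TK-type network. The abstract does not list the reactions, so the sketch assumes the cyclic autocatalytic form usually associated with the TK model (species i converts to species i+1 on contact, plus inflow and outflow for each species). All parameter names and values are hypothetical, and the sketch simulates the discrete model only, not the CLA.

```python
import random


def simulate_tk(n_species=4, volume=32.0, kappa=1.0, inflow=0.01,
                outflow=0.01, t_end=5.0, seed=0):
    """Gillespie simulation of an assumed TK-type network.

    Assumed reactions (not taken from the abstract):
      X_i + X_{i+1} -> 2 X_{i+1}   (cyclic autocatalysis, rate kappa / volume)
      0 -> X_i                     (inflow, rate inflow * volume)
      X_i -> 0                     (outflow, rate outflow per molecule)
    Returns a list of (time, counts) pairs.
    """
    rng = random.Random(seed)
    n = n_species
    x = [int(volume)] * n
    t = 0.0
    path = [(t, tuple(x))]
    while True:
        auto = [kappa * x[i] * x[(i + 1) % n] / volume for i in range(n)]
        feed = [inflow * volume] * n
        drain = [outflow * x[i] for i in range(n)]
        rates = auto + feed + drain
        total = sum(rates)  # always positive because the inflow rates are positive
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            return path
        k = rng.choices(range(3 * n), weights=rates)[0]
        kind, i = divmod(k, n)
        if kind == 0 and x[i] > 0:      # autocatalysis: one X_i becomes X_{i+1}
            x[i] -= 1
            x[(i + 1) % n] += 1
        elif kind == 1:                 # inflow of one X_i
            x[i] += 1
        elif kind == 2 and x[i] > 0:    # outflow of one X_i
            x[i] -= 1
        path.append((t, tuple(x)))


path = simulate_tk()
print(len(path), path[-1])
```

Molecule counts stay non-negative here because a reaction that consumes a species has zero rate once that species is absent; the CLA in the abstract enforces the analogous constraint on concentrations through reflection at the boundary.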

     
  4. Abstract

    The COVID-19 outbreak is a global pandemic declared by the World Health Organization, with rapidly increasing cases in most countries. A wide range of research is urgently needed to understand the COVID-19 pandemic, including its transmissibility, geographic spread, risk factors for infection, and economic impacts. Reliable data archiving and sharing are essential to jump-start innovative research to combat COVID-19. This research is a collaborative and innovative effort to build such an archive, including the collection of various data resources relevant to COVID-19 research, such as daily cases, social media, population mobility, health facilities, climate, socioeconomic data, research articles, policy and regulation, and global news. Because the data sources are heterogeneous, our effort also includes processing and integrating the different datasets on GIS (Geographic Information System) base maps to make them relatable and comparable. To keep the data files permanent, we published all open data to the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/2019ncov), an online data management and sharing platform with a permanent Digital Object Identifier number for each dataset. Finally, preliminary studies based on the shared COVID-19 datasets revealed different spatial transmission patterns among mainland China, Italy, and the United States.
  5. null (Ed.)
  6. Abstract

    Motivation

    High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy.

    Results

    We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples an efficient variational inference and a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature.

    Availability and implementation

    The software is available at https://github.com/hao-peng/DEIsoM

    Supplementary information

    Supplementary data are available at Bioinformatics online.
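
The Dirichlet and multinomial structure described in the Results paragraph above can be sketched as a toy generative simulation. This is only an assumption-laden illustration of the hierarchy (a shared Dirichlet prior per condition, per-replicate isoform proportions drawn from it, and reads assigned to isoforms multinomially); it is not the DEIsoM software, it performs no variational inference, and the function names and parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_replicate(alpha, n_reads, rng):
    """One replicate: isoform proportions from the shared prior, then read counts."""
    theta = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    # Each read is assigned to an isoform with probability theta[k] (multinomial).
    for k in rng.choices(range(len(alpha)), weights=theta, k=n_reads):
        counts[k] += 1
    return theta, counts


rng = random.Random(0)
alpha_condition = [8.0, 3.0, 1.0]  # hypothetical shared prior for one condition
replicates = [sample_replicate(alpha_condition, 1000, rng) for _ in range(3)]
for theta, counts in replicates:
    print([round(p, 3) for p in theta], counts)
```

Replicates drawn from the same `alpha_condition` share a common expression pattern while still differing individually, which is the behavior the abstract attributes to the Dirichlet prior.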

     